Title: Clang versus GCC: binary size
Date: 2017-07-08
Modified: 2017-07-08
Category: Blog
Slug: clang-vs-gcc-binary-size
Tags: c, clang, gcc
Summary: In which I compare sizes (of binaries) and find, as ever, that mileages vary

Currently, there is hardly a less resolvable argument than the old 'GCC versus
Clang' one, unless you count Emacs versus Vim (or perhaps Linux-based versus
BSD-based). The reasons for this include the constant work on both compilers
to try and one-up each other, the fact that much of the debate is social rather
than technical, and the fact that benchmarks frequently lie or aren't
representative of your particular use case. Despite this, the argument
continues, and several sites (such as [Phoronix][9]) do regular performance
shoot-outs between the two compilers whenever one or the other has a new 
release.

These shoot-outs tend to focus on *speed*, and to nobody's great surprise, the
results are both highly varied and quite narrow margin-wise.
Recently, however, I've been looking into compiling the smallest possible
binaries. There are several reasons for this in my particular case, but in
general, small *is* beautiful, and can even be faster. Due to mostly social
issues (availability of tooling), I tend to prefer Clang, and seemingly just to
please obsessives like me, it even has an ``-Oz`` optimization option, designed
to produce even smaller binaries than ``-Os`` would. On this basis, I've been
using it for my own work up to now. However, I'm not a diehard Clang fanboy, and
have switched between the two several times in the last few years.

Not too long ago, I decided to try and compile the smallest possible
decompressor (for reasons that aren't too significant to any of this),
eventually finding [miniLZO][5]. However, in the process of doing this testing
with various decompressors, I found that Clang with ``-Oz`` did not necessarily
produce the smallest binary; in some cases, GCC would win, and by a non-trivial
amount, even though it has no corresponding ``-Oz`` flag. A little light
searching showed me that [others][1] had stumbled across [similar outcomes][2].
With this new information, I decided that some testing might be in order.
This is partly to see how much of a difference it can make (turns out that the
answer is 'quite a bit'), and whether GCC or Clang tends to do better (turns out
that the answer is 'Clang, but sorta-kinda'); but it is also to bring some
attention to this, and hopefully get some answers from people who know a bit
more about both compilers (and maybe even to see some improvements, who knows).

Coming from a position of relative ignorance, I can't really speculate *why* any
of my results came out the way they did. However, I hope that maybe someone
might be able to chime in and tell me. With that in mind, here is my experiment
and what happened.

## Setup ##

As per Morgan Spurlock, there shall be a few rules:

1. Only pure C will be considered (I have no interest in C++).
2. GCC and Clang will both be given identical optimization flag sets.
3. Binary size will be measured using ``wc -c`` (i.e. the number of bytes in the
   resulting executable or library).
4. I will test using Sebastian (my main Arch work machine), using GCC version
   7.1.1 (as per ``gcc --version``) and Clang version 4.0.1 (as per ``clang
   --version``).

The particular codebases I chose are as follows:

* The [Lua][3] interpreter and static library, version 5.3.4
* The [Chibi Scheme][4] interpreter and dynamic library, commit ``8589333``
* [miniLZO][5] test program, version 2.10
* [MuPDF][6] executable, version 1.11 (for a real OS, not Windows)
* [cmus][7] executable, version 2.7.1
* [SQLite][8] amalgamation, version 3190300

I chose these for several reasons: they're the kind of thing you'd want to make
small, they are of varying sizes and do different things, and I happen to like
them. For each of these, I will compile them using both GCC and Clang, given the
rules above, and check their byte counts. As a further test, I will also compile
them with Clang using ``-Oz`` to see if this makes any difference.
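
The procedure can be sketched in Python. How the flags actually get passed
depends on each project's build system (often by overriding ``CC`` and
``CFLAGS``), so this is only a minimal sketch of the simplest case: a handful of
C files compiled directly. The names ``build_command``, ``size_in_bytes`` and
``measure``, and the output naming scheme, are all assumptions made for
illustration.

```python
import os
import subprocess

# The three configurations compared in this post.
CONFIGS = [("gcc", "-Os"), ("clang", "-Os"), ("clang", "-Oz")]


def build_command(compiler, opt_flag, sources, output):
    """Return the argv list for one build.

    Only the compiler and the optimization flag differ between
    configurations; everything else stays identical.
    """
    return [compiler, opt_flag, *sources, "-o", output]


def size_in_bytes(path):
    """Byte count of a file, the same number `wc -c` reports."""
    return os.path.getsize(path)


def measure(sources, out_dir="."):
    """Build `sources` once per configuration and return {config: bytes}.

    Needs gcc and clang on PATH. The output file names are a
    hypothetical scheme chosen for this sketch.
    """
    sizes = {}
    for compiler, opt_flag in CONFIGS:
        output = os.path.join(out_dir, f"out-{compiler}{opt_flag}")
        subprocess.run(
            build_command(compiler, opt_flag, sources, output), check=True
        )
        sizes[f"{compiler} {opt_flag}"] = size_in_bytes(output)
    return sizes
```

Calling ``measure(["main.c"])`` on a single-file program would then give one
byte count per configuration, ready to compare.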

## Results ##

So, without further ado, here are the results. I've **bolded** the best result.

| Codebase            | ``gcc -Os``  | ``clang -Os`` | ``clang -Oz`` | 
| ------------------- | -----------  | ------------- | --------------| 
| Lua interpreter     | **212560**   | 215432        | 221472        |
| Lua statlib         | 368324       | **366828**    | 369980        |
| Chibi interpreter   | 202768       | 62752         | **60376**     |
| Chibi dynalib       | 867520       | 655552        | **620360**    |
| miniLZO tester      | 14384        | **14360**     | **14360**     |
| MuPDF executable    | **35488120** | 35858928      | 35858928      |
| cmus executable     | **344016**   | 344816        | 344752        |
| SQLite amalgamation | **646704**   | 726272        | 697872        |

Here is the same data, as a delta from the best:

| Codebase            | ``gcc -Os``  | ``clang -Os`` | ``clang -Oz`` | 
| ------------------- | -----------  | ------------- | --------------| 
| Lua interpreter     | 0            | 2872          | 8912          |
| Lua statlib         | 1496         | 0             | 3152          |
| Chibi interpreter   | 142392       | 2376          | 0             |
| Chibi dynalib       | 247160       | 35192         | 0             |
| miniLZO tester      | 24           | 0             | 0             |
| MuPDF executable    | 0            | 370808        | 370808        |
| cmus executable     | 0            | 800           | 736           |
| SQLite amalgamation | 0            | 79568         | 51168         |

And the same deltas, only this time as percentages of the best:

| Codebase            | ``gcc -Os``  | ``clang -Os`` | ``clang -Oz`` | 
| ------------------- | -----------  | ------------- | --------------| 
| Lua interpreter     | 0%           | ~1.4%         | ~4.2%         |
| Lua statlib         | ~0.4%        | 0%            | ~0.9%         |
| Chibi interpreter   | ~235.8%      | ~4%           | 0%            |
| Chibi dynalib       | ~39.8%       | ~5.7%         | 0%            |
| miniLZO tester      | ~0.2%        | 0%            | 0%            |
| MuPDF executable    | 0%           | ~1%           | ~1%           |
| cmus executable     | 0%           | ~0.2%         | ~0.2%         |
| SQLite amalgamation | 0%           | ~12.3%        | ~7.9%         |
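
For anyone wanting to check the arithmetic, the two derived tables follow from
the raw sizes: each delta is the size minus the smallest size in that row, and
each percentage is that delta divided by the smallest size. A short Python
sketch (the function names are made up for illustration; the numbers are copied
from the first table):

```python
# Sizes in bytes, copied from the results table.
SIZES = {
    "Lua interpreter": {"gcc -Os": 212560, "clang -Os": 215432, "clang -Oz": 221472},
    "Lua statlib": {"gcc -Os": 368324, "clang -Os": 366828, "clang -Oz": 369980},
    "Chibi interpreter": {"gcc -Os": 202768, "clang -Os": 62752, "clang -Oz": 60376},
    "Chibi dynalib": {"gcc -Os": 867520, "clang -Os": 655552, "clang -Oz": 620360},
    "miniLZO tester": {"gcc -Os": 14384, "clang -Os": 14360, "clang -Oz": 14360},
    "MuPDF executable": {"gcc -Os": 35488120, "clang -Os": 35858928, "clang -Oz": 35858928},
    "cmus executable": {"gcc -Os": 344016, "clang -Os": 344816, "clang -Oz": 344752},
    "SQLite amalgamation": {"gcc -Os": 646704, "clang -Os": 726272, "clang -Oz": 697872},
}


def deltas(row):
    """Bytes over the smallest binary in this row, per configuration."""
    best = min(row.values())
    return {config: size - best for config, size in row.items()}


def percent_over_best(row):
    """The same deltas, as a percentage of the smallest binary."""
    best = min(row.values())
    return {config: 100 * (size - best) / best for config, size in row.items()}


for name, row in SIZES.items():
    formatted = {c: f"{p:.1f}%" for c, p in percent_over_best(row).items()}
    print(name, deltas(row), formatted)
```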

## Analysis ##

At a glance, we basically have every possible outcome represented here: in some
cases, GCC is the best (Lua interpreter, SQLite amalgamation, MuPDF and cmus); in
some cases, Clang with ``-Os`` is (Lua static library); in some cases, Clang
with ``-Oz`` is (Chibi in general); and sometimes, both ``-Os`` and ``-Oz`` are
the same (miniLZO tester). The differences between Clang and GCC also vary,
ranging from a difference of less than a percentage point (cmus) to *over three
times bigger* (Chibi interpreter). However, when Clang loses, it doesn't lose as
badly as GCC: if we compare ``-Os`` performance, Clang only added about 12% at
worst (about 8% if we look at ``-Oz``), while GCC inflated the Chibi interpreter
by a hilarious 235% or so. The results also show that ``-Oz`` isn't always
better than ``-Os``: it is larger for both Lua targets, by about 2.8% for the
interpreter and just under 1% for the static library.

There does not appear to be any obvious reason for these differences, although
it is broadly true that Clang appears to compile slightly larger executables in
most cases. However, GCC suffers the biggest blowup in size on an executable
(and a pretty thin one actually, considering that most of Chibi's interpreter
functionality comes from its dynamic library). Library-wise, Clang appears to be
better, but as I only tested two libraries (one static, one dynamic), this
result may not be typical or representative.

It's worth mentioning that these tests were done only on x86\_64 (because that's
what Sebastian happens to be), and on such platforms, space is rarely an issue.
What these results would look like on platforms where space *could* be of
concern (such as ARM) is unknown (although I might end up doing that on
something like my microserver or tablet just to see what happens). Additionally,
I didn't compare these for speed with the typical settings used by each of these
projects (usually ``-O2``); it would be instructive to do this, but this
requires considerably more complex experimental design, to which I am currently
not up.

## Conclusion ##

Overall, we have a tie, but at least for these cases, I believe GCC loses. The
worst inflation for Clang is only between 8 and 12 percent, while GCC's worst
binary is over *three times* the size of the smallest one; additionally, GCC's
worst blowup is on a very simple bit of code, but Clang's worst one is on a
fairly complex single-file amalgamation.
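
Both halves of that verdict (the tie on wins, and the lopsided worst cases) can
be recomputed from the results table. The sketch below is one way of doing so,
not a definitive scoring method: it counts a win for a compiler whenever any of
its configurations produces the smallest binary, which is an assumption about
what 'winning' means here. The helper names are invented for illustration.

```python
CONFIGS = ("gcc -Os", "clang -Os", "clang -Oz")

# Sizes in bytes per codebase, in CONFIGS order, from the results table.
SIZES = {
    "Lua interpreter": (212560, 215432, 221472),
    "Lua statlib": (368324, 366828, 369980),
    "Chibi interpreter": (202768, 62752, 60376),
    "Chibi dynalib": (867520, 655552, 620360),
    "miniLZO tester": (14384, 14360, 14360),
    "MuPDF executable": (35488120, 35858928, 35858928),
    "cmus executable": (344016, 344816, 344752),
    "SQLite amalgamation": (646704, 726272, 697872),
}


def wins_by_compiler(sizes):
    """Count the codebases where each compiler has the smallest binary."""
    wins = {}
    for row in sizes.values():
        best = min(row)
        # A set, so that a tie between -Os and -Oz counts once for Clang.
        winners = {c.split()[0] for c, s in zip(CONFIGS, row) if s == best}
        for compiler in winners:
            wins[compiler] = wins.get(compiler, 0) + 1
    return wins


def worst_overhead(sizes):
    """Largest percentage over the best size, per configuration."""
    worst = dict.fromkeys(CONFIGS, 0.0)
    for row in sizes.values():
        best = min(row)
        for config, size in zip(CONFIGS, row):
            worst[config] = max(worst[config], 100 * (size - best) / best)
    return worst


print(wins_by_compiler(SIZES))
print({c: f"{p:.1f}%" for c, p in worst_overhead(SIZES).items()})
```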

What I believe this highlights above all is that you should test your
assumptions thoroughly. Unless you're feature-bound to a single one of GCC or
Clang (and the number of such features is relatively small, and tends to impact
very few projects), you should try both, with different settings, and see which
gives the best results. This has been known to be true of optimizations for
*speed* for a very long time; these results suggest that much the same can be
said for size.

This also highlights the importance of writing portable code. If you don't rely
on features that are unique to a specific compiler (or that step outside the
standard generally), you can always switch to another one if you find it
produces better results, whether on your particular platform or in general. You
can chalk this up to the benefits of portability, in addition to everything else
that this brings. While GCC and Clang are neck and neck with regard to a lot of
things, this can only be said in general: your specific results may vary, and
from what I've seen here, potentially by a whole damn lot.

[1]: https://github.com/android-ndk/ndk/issues/21
[2]: https://github.com/android-ndk/ndk/issues/133
[3]: https://www.lua.org/ftp/
[4]: https://github.com/ashinn/chibi-scheme
[5]: http://www.oberhumer.com/opensource/lzo/#minilzo
[6]: https://mupdf.com/downloads/
[7]: https://cmus.github.io/
[8]: https://sqlite.org
[9]: http://phoronix.com/scan.php?page=home
